Data quality diagnostic criteria
¹ Except for pre-agreed cases
² Optional criterion for organizing data in Vertica or DB2
Ratings: "Good", "Satisfactory", "Poor"
Normalization²
Good:
▪ Table refers to clear directories
▪ There is a unique key
Poor:
▪ Data are stored in one big table, no directories available
▪ Key is not available
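The unique-key part of the normalization criterion can be checked mechanically. The following is a minimal sketch, assuming the table is available as a list of dictionaries; the function name `has_unique_key` and the `key_columns` parameter are hypothetical and not part of the source. It does not check whether the table refers to directories.

```python
def has_unique_key(rows, key_columns):
    """Return True if key_columns identify every row uniquely.

    Assumption: a key with a missing (None) part does not count as a key.
    """
    seen = set()
    for row in rows:
        key = tuple(row.get(col) for col in key_columns)
        if any(part is None for part in key) or key in seen:
            return False
        seen.add(key)
    return True


# Hypothetical sample data for illustration only.
orders = [
    {"order_id": 1, "customer": "A"},
    {"order_id": 2, "customer": "A"},
]
print(has_unique_key(orders, ["order_id"]))
print(has_unique_key(orders, ["customer"]))
```

A table that fails this check for every candidate column set would fall under "Key is not available".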
Time completeness
Good:
▪ Number of entries per month from the start of data acquisition deviates by less than 50% from the median¹
Satisfactory:
▪ Number of entries deviates from the mean by more than 50% in at least one of the periods
Poor:
▪ Number of entries deviates from the mean by more than a factor of 2 in at least one of the periods
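The time completeness thresholds can be sketched as a small rating function. This is an illustration under stated assumptions, not a definitive implementation: the slide mixes median and mean, so the sketch uses the median as the single reference; "more than 2 times" is read as a monthly count above twice the reference; a deviation of exactly 50% is treated as "Satisfactory". The function name `rate_time_completeness` is hypothetical.

```python
from statistics import median


def rate_time_completeness(monthly_counts):
    """Rate monthly entry counts as "Good", "Satisfactory" or "Poor"."""
    if not monthly_counts:
        raise ValueError("need at least one monthly count")
    reference = median(monthly_counts)  # assumption: median used throughout
    if reference <= 0:
        raise ValueError("reference count must be positive")
    # Assumption: "more than 2 times" means a count above twice the reference.
    if any(count > 2 * reference for count in monthly_counts):
        return "Poor"
    # Assumption: a deviation of exactly 50% already counts as "Satisfactory".
    if any(abs(count - reference) >= 0.5 * reference for count in monthly_counts):
        return "Satisfactory"
    return "Good"


print(rate_time_completeness([100, 110, 95, 105]))
print(rate_time_completeness([100, 110, 40, 105]))
print(rate_time_completeness([100, 110, 95, 300]))
```

Pre-agreed exceptions (footnote 1), such as a known seasonal dip, would have to be excluded from `monthly_counts` before rating.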
Correctness
Good:
▪ No outliers (>500% of the median)¹
Poor:
▪ More than 1% of values are outliers with a delta of more than 500% of the median
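The outlier criterion can be sketched in the same way. The sketch assumes that an outlier is a value whose absolute distance from the median exceeds 500% of the median, and that an outlier share above zero but not above 1%, which the slide leaves unrated, falls under "Satisfactory". The names `outlier_share` and `rate_correctness` are hypothetical.

```python
from statistics import median


def outlier_share(values, factor=5.0):
    """Share of values whose distance from the median exceeds factor * |median|."""
    if not values:
        raise ValueError("need at least one value")
    center = median(values)
    limit = factor * abs(center)  # 500% of the median by default
    outliers = sum(1 for v in values if abs(v - center) > limit)
    return outliers / len(values)


def rate_correctness(values):
    """Rate a numeric attribute by its share of outliers."""
    share = outlier_share(values)
    if share == 0:
        return "Good"
    if share > 0.01:
        return "Poor"
    # Assumption: the range between 0 and 1% is not rated on the slide.
    return "Satisfactory"


print(rate_correctness([10, 11, 12]))
print(rate_correctness([10] * 98 + [100, 100]))
```

Because the limit scales with the median, an attribute whose median is zero flags every nonzero value as an outlier; such attributes need a different reference.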
Quality
Good:
▪ Values are presented fully and sufficiently (filled in for 90% and above)
Satisfactory:
▪ Insignificant gaps (<30%) in at least one attribute
Poor:
▪ >30% of gaps in at least one attribute
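The fill-rate thresholds can be sketched per attribute. The sketch assumes rows are dictionaries, that `None` and the empty string both count as gaps, and that a gap share of exactly 30%, which the slide leaves undefined, is rated "Poor". The names `gap_share` and `rate_fill_quality` are hypothetical.

```python
def gap_share(rows, attribute):
    """Share of rows in which the attribute is missing (None or empty string)."""
    missing = sum(1 for row in rows if row.get(attribute) in (None, ""))
    return missing / len(rows)


def rate_fill_quality(rows, attributes):
    """Rate a table by the attribute with the largest share of gaps."""
    if not rows or not attributes:
        raise ValueError("need at least one row and one attribute")
    worst = max(gap_share(rows, attr) for attr in attributes)
    if worst <= 0.10:  # filled in for 90% and above
        return "Good"
    if worst < 0.30:   # insignificant gaps
        return "Satisfactory"
    return "Poor"      # assumption: exactly 30% is also rated "Poor"


# Hypothetical sample: 10 rows, "email" missing in 2 of them.
sample = [{"id": i, "email": None if i < 2 else "x@example.com"} for i in range(10)]
print(rate_fill_quality(sample, ["id", "email"]))
```

Rating by the worst attribute follows the slide's wording "in at least one attribute".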
SOURCE: Digital McKinsey - Building best-in-class Data Management Architecture